The fundamental shift in high-performance computing is the move from a CPU-centric, serial execution model to a decoupled producer-consumer model, in which the CPU manages the pipeline while the GPU runs independently. The core insight is that a GPU is not meant to be driven as a strictly synchronous device; treating it as one creates a "stop-and-wait" bottleneck.
1. The Workflow Lifecycle
In the asynchronous mindset, the developer does not wait for each task to finish. Instead, they allocate memory, launch kernels, and copy results back by enqueuing non-blocking requests into a hardware queue.
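The lifecycle above can be sketched in HIP C++. This is a minimal illustration, not code from the original: all identifiers (scaleKernel, h_buf, d_buf, N) are placeholders, and pinned host memory is assumed so the copies can be truly asynchronous.

```cpp
#include <hip/hip_runtime.h>

// Trivial kernel used only to illustrate the enqueue step (illustrative name).
__global__ void scaleKernel(float* data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t N = 1 << 20;
    float *h_buf, *d_buf;
    hipHostMalloc(&h_buf, N * sizeof(float));  // pinned host memory: needed for truly async copies
    hipMalloc(&d_buf, N * sizeof(float));      // allocate device memory

    hipStream_t stream;
    hipStreamCreate(&stream);

    // Each call below enqueues a non-blocking request and returns immediately:
    hipMemcpyAsync(d_buf, h_buf, N * sizeof(float), hipMemcpyHostToDevice, stream);
    scaleKernel<<<(N + 255) / 256, 256, 0, stream>>>(d_buf, 2.0f, N);
    hipMemcpyAsync(h_buf, d_buf, N * sizeof(float), hipMemcpyDeviceToHost, stream);

    // The CPU is free to do other work here while the GPU drains the queue.

    hipStreamSynchronize(stream);  // block only once, when the results are needed
    hipFree(d_buf);
    hipHostFree(h_buf);
    hipStreamDestroy(stream);
    return 0;
}
```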
2. Overcoming Stalls
When the host is forced to synchronize after every operation, the gaps in execution (the transfer time between CPU and GPU) become the dominant performance bottleneck. By exploiting asynchrony, the CPU can keep working while the GPU processes its stream in parallel, maximizing hardware utilization.
$$\text{Total time} = \max(\text{CPU work}, \text{GPU work}) + \text{synchronization overhead}$$
QUESTION 1
Which set of steps correctly converts a synchronous vector-add to use an explicit stream?
Call hipStreamCreate, use hipMemcpyAsync with the handle, and pass the handle as the 4th kernel launch-configuration argument.
Call hipDeviceSynchronize after every kernel launch and use hipMemcpy.
Set the stream parameter to NULL in all hipMemcpyAsync calls.
Replace hipMalloc with hipHostMalloc exclusively.
✅ Correct!
Correct. Explicit streams require handle creation, async memory operations, and passing the handle to the kernel launch configuration.
❌ Incorrect
Using hipMemcpy (blocking) or the NULL stream (implicitly synchronous) defeats the purpose of the mindset shift.
QUESTION 2
Why is a GPU considered 'not meant to be driven as a strictly synchronous device'?
Because it has no internal clock.
Because waiting for the CPU to confirm every command leaves thousands of cores idle.
Because memory transfers cannot be tracked by the CPU.
Because the GPU must manage its own power state.
✅ Correct!
GPU efficiency comes from high-throughput parallel work; synchronizing after every small step creates 'dead air' on the hardware.
❌ Incorrect
The issue is latency and core utilization, not hardware clocking or power management.
QUESTION 3
What is the primary risk of forcing the host to synchronize after every operation?
Memory corruption.
Host-side stalling and loss of hardware saturation.
Increased power consumption on the GPU.
Kernel compile errors.
✅ Correct!
Synchronous calls block the CPU, preventing it from preparing the next 'chunk' of work for the GPU.
❌ Incorrect
While inefficient, it doesn't corrupt memory or cause compilation errors.
QUESTION 4
In the logistics warehouse analogy, what does the 'Conveyor Belt' represent?
A HIP Stream.
The GPU Driver.
The CPU Cache.
The VRAM buffer.
✅ Correct!
A stream acts like a conveyor belt, allowing the CPU to load tasks sequentially without waiting for the worker (the GPU) to finish the current one.
❌ Incorrect
The stream is the FIFO queue mechanism that facilitates the non-blocking 'conveyor' flow.
QUESTION 5
True or False: hipMemcpyAsync returns control to the CPU before the data transfer is complete.
True
False
✅ Correct!
Yes! This is the definition of non-blocking: the CPU just enqueues the request and moves on.
❌ Incorrect
If it waited, it would be a standard synchronous hipMemcpy.
Case Study: The Warehouse Manager's Bottleneck
Asynchrony Implementation
A legacy ROCm application uses standard hipMemcpy and kernel launches without stream handles. The CPU utilization is 98%, but the GPU is only at 15% utilization because it waits for the CPU to finish logging data before starting the next copy.
Q
Explain how Asynchrony would fix this 'stop-and-wait' bottleneck.
Solution:
By using asynchrony, the CPU can enqueue the next data transfer and kernel launch to a HIP stream and immediately return to its logging tasks. This allows the GPU to process the stream in parallel with the CPU's logging, keeping the compute cores saturated.
Q
Provide the code required to create a stream and launch a kernel into it (replacing a default launch).
Solution:
hipStream_t myStream;
hipStreamCreate(&myStream);                    // create an explicit stream handle
myKernel<<<grid, block, 0, myStream>>>(args);  // 4th launch-config argument routes the kernel into the stream
Q
What function must be called to ensure the data is fully copied back to the host before the CPU accesses it?
Solution:
hipStreamSynchronize(myStream); must be called. This is the explicit 'handshake' that confirms all previous work in that specific stream is complete.
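Tying the case-study pieces together, here is a minimal sketch of that handshake in context; the function and buffer names (fetchResults, h_result, d_result, doHostLogging) are illustrative, not part of the original application.

```cpp
#include <hip/hip_runtime.h>

// Forward declaration of an illustrative host-side task (e.g., the logging
// work from the case study).
void doHostLogging();

// Copy results back asynchronously, overlap host work, then synchronize
// exactly once before the host touches the buffer.
void fetchResults(float* h_result, const float* d_result, size_t bytes,
                  hipStream_t myStream) {
    // Enqueue the device-to-host copy; control returns immediately.
    hipMemcpyAsync(h_result, d_result, bytes, hipMemcpyDeviceToHost, myStream);

    doHostLogging();  // CPU keeps working while the GPU drains the stream

    hipStreamSynchronize(myStream);  // the explicit handshake for this stream
    // h_result is now safe to read on the host.
}
```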